
43 research outputs found

Using gene expression profiles from peripheral blood to identify asymptomatic responses to acute respiratory viral infections

Author: A Grishin
A Statnikov
AK Zaas
Alexander Statnikov
CF Aliferis
CF Wright
Constantin F Aliferis
GY Chen
J Dresios
Jörn-Hendrik Weitkamp
KA Carlson
Lauren McVoy
Nikita I Lytkin
O Kepp
O Ramilo
RJ Schneider
RR Novoa
T Ohman
UM Braga-Neto
VN Vapnik
Z Yang
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract. Background: A recent study reported that gene expression profiles from peripheral blood samples of healthy subjects prior to viral inoculation were indistinguishable from profiles of subjects who received viral challenge but remained asymptomatic and uninfected. If true, this implies that the host immune response does not have a molecular signature. Given the high sensitivity of microarray technology, we were intrigued by this result and hypothesized that it was an artifact of data analysis. Findings: Using acute respiratory viral challenge microarray data, we developed a molecular signature that, for the first time, allowed accurate differentiation between uninfected subjects prior to viral inoculation and subjects who remained asymptomatic after the viral challenge. Conclusions: Our findings suggest that molecular signatures can be used to characterize immune responses to viruses and may improve our understanding of susceptibility to viral infection, with possible implications for vaccine development.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Analysis and Computational Dissection of Molecular Signature Multiplicity

Author: A Ploner
Alexander Statnikov
B Hammer
CF Aliferis
Constantin F. Aliferis
DL Gold
E Dougherty
F Azuaje
F Wagner
G Balazsi
G Natsoulis
I Guyon
I Tsamardinos
J Pearl
J Peña
J Shawe-Taylor
JP Ioannidis
L Ein-Dor
L Li
LR Grate
M Hollander
P Roepman
RL Somorjai
S Michiels
S Ramaswamy
Scott Markel
SM Weiss
T Chu
TR Golub
TS Furey
X Qiu
Publication venue: Public Library of Science
Publication date: 01/05/2010

Molecular signatures are computational or mathematical models created to diagnose disease and other phenotypes and to predict clinical outcomes and response to treatment. It is widely recognized that molecular signatures constitute one of the most important translational and basic science developments enabled by recent high-throughput molecular assays. A perplexing phenomenon that characterizes high-throughput data analysis is the ubiquitous multiplicity of molecular signatures. Multiplicity is a special form of data analysis instability in which different analysis methods used on the same data, or different samples from the same population, lead to different but apparently maximally predictive signatures. This phenomenon has far-reaching implications for biological discovery and for the development of next-generation patient diagnostics and personalized treatments. Currently, the causes and interpretation of signature multiplicity are unknown, and several, often contradictory, conjectures have been made to explain it. We present a formal characterization of signature multiplicity and a new efficient algorithm that offers theoretical guarantees for extracting the set of maximally predictive and non-redundant signatures independent of distribution. The new algorithm identifies exactly the set of optimal signatures in controlled experiments and yields signatures with significantly better predictivity and reproducibility than previous algorithms in human microarray gene expression datasets. Our results shed light on the causes of signature multiplicity, provide computational tools for studying it empirically, and introduce a framework for in silico bioequivalence of this important new class of diagnostic and personalized medicine modalities.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Expanding the Understanding of Biases in Development of Clinical-Grade Molecular Signatures: A Case Study in Acute Respiratory Viral Infections

Author: A Rangarajan
A Statnikov
AK Zaas
Alexander Statnikov
AM Glas
C Ambroise
CF Aliferis
Constantin F. Aliferis
EE Ntzani
ER DeLong
F Azuaje
FJ Gonzalez
GG Jackson
I Guyon
I Tsamardinos
J Pearl
JA Sparano
JT Leek
Jörn-Hendrik Weitkamp
KA Baggerly
Lauren McVoy
LM Cope
Nikita I. Lytkin
O Ramilo
R Kohavi
R Simon
RA Irizarry
RL Somorjai
TW Anderson
UM Braga-Neto
Vladimir Brusic
VN Vapnik
WE Johnson
Y Benjamini
Z Liu
Publication venue: Public Library of Science
Publication date: 01/06/2011

The promise of modern personalized medicine is to use molecular and clinical information to better diagnose, manage, and treat disease on an individual patient basis. These functions are predominantly enabled by molecular signatures, which are computational models for predicting phenotypes and other responses of interest from high-throughput assay data. Data analytics is a central component of molecular signature development and can jeopardize the entire process if conducted incorrectly. While exploratory data analysis may tolerate suboptimal protocols, clinical-grade molecular signatures are subject to vastly stricter requirements. Closing the gap between standards for exploratory versus clinically successful molecular signatures entails a thorough understanding of possible biases in the data analysis phase and developing strategies to avoid them. Using a recently introduced data-analytic protocol as a case study, we provide an in-depth examination of the poorly studied biases of the data-analytic protocols related to signature multiplicity, biomarker redundancy, data preprocessing, and validation of signature reproducibility. The methodology and results presented in this work are aimed at expanding the understanding of these data-analytic biases that affect development of clinically robust molecular signatures. Several recommendations follow from the current study. First, all molecular signatures of a phenotype should be extracted to the extent possible, in order to provide comprehensive and accurate grounds for understanding disease pathogenesis. Second, redundant genes should generally be removed from final signatures to facilitate reproducibility and decrease manufacturing costs. Third, data preprocessing procedures should be designed so as not to bias biomarker selection. Finally, molecular signatures developed and applied on different phenotypes and populations of patients should be treated with great caution.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Factors Influencing the Statistical Power of Complex Data Analysis Protocols for Molecular Signature Development from Microarray Data

Author: A Bhattacharjee
A Butte
A Dupuy
A Potti
A Rosenwald
A Statnikov
Alexander Statnikov
AM Glas
B Freidlin
Bryan E. Shepherd
CF Aliferis
Constantin F. Aliferis
CX Ling
DG Beer
DJ Hand
EJ Yeoh
EL Lehmann
FE Harrell Jr
Frank E. Harrell
G Casella
Ioannis Tsamardinos
JA Sparano
Jonathan S. Schildcrout
JP Ioannidis
KK Dobbin
L Ein-Dor
L Shi
LA Habel
LJ van't Veer
M Saerens
MD Radmacher
ME Burczynski
MJ Marton
ML Lee
N Iizuka
P Baldi
PI Good
R Kohavi
R Simon
RE Fan
S Michiels
S Mukherjee
S Paik
S Ramaswamy
SL Pomeroy
T Bammler
T Hastie
TR Golub
TS Furey
UM Braga-Neto
Vladimir B. Bajic
VN Vapnik
W Jiang
Publication venue: Public Library of Science
Publication date: 17/03/2009

Critical to the development of molecular signatures from microarray and other high-throughput data is testing the statistical significance of the produced signature in order to ensure its statistical reproducibility. While current best practices emphasize sufficiently powered univariate tests of differential expression, little is known about the factors that affect the statistical power of complex multivariate analysis protocols for high-dimensional molecular signature development. We show that choices of specific components of the analysis (i.e., error metric, classifier, error estimator and event balancing) have large and compounding effects on statistical power. The effects are demonstrated empirically by an analysis of 7 of the largest microarray cancer outcome prediction datasets and supplementary simulations, and by contrasting them to prior analyses of the same data. The findings of the present study have two important practical implications. First, high-throughput studies, by avoiding under-powered data analysis protocols, can achieve substantial economies in the sample size required to demonstrate statistical significance of predictive signal. Factors that affect power are identified and studied. Much smaller samples than previously thought may be sufficient for exploratory studies, as long as these factors are taken into consideration when designing and executing the analysis. Second, previous highly cited claims that microarray assays may not be able to predict disease outcomes better than chance are shown by our experiments to be due to under-powered data analysis combined with inappropriate statistical tests.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

A Markov blanket-based method for detecting causal SNPs in GWAS

Author: A Hamosh
BA McKinney
Bing Han
C Kooperberg
C-c Chang
CF Aliferis
D Koller
D Margaritis
DF Easton
HJ Cordell
I Tsamardinos
J Fellay
J Li
J Marchini
JH McDonald
JH Moore
JK Pritchard
LW Hahn
M Robnik-Šikonja
MD Ritchie
MD Shriver
Meeyoung Park
MY Park
P Spirtes
R Jiang
RJ Klein
RR Sokal
SE Antonarakis
SH Chen
SK Musani
ST Sherry
X-W Chen
Xue-wen Chen
Y Zhang
Publication venue: BioMed Central
Publication date: 01/04/2010

Abstract. Background: Detecting epistatic interactions associated with complex and common diseases can help to improve prevention, diagnosis and treatment of these diseases. With the development of genome-wide association studies (GWAS), designing powerful and robust computational methods for identifying epistatic interactions associated with common diseases has become a major challenge for the bioinformatics community, because the study of epistatic interactions must deal with very large genotype datasets and a huge number of combinations of possible genetic factors. Most existing computational detection methods are based on the classification capacity of SNP sets, which may fail to identify SNP sets that are strongly associated with the diseases and may introduce many false positives. In addition, most methods are not suitable for genome-wide scale studies due to their computational complexity. Results: We propose a new Markov blanket-based method, DASSO-MB (Detection of ASSOciations using Markov Blanket), to detect epistatic interactions in case-control GWAS. The Markov blanket of a target variable T completely shields T from all other variables. Thus, we can guarantee that the SNP set detected by DASSO-MB has a strong association with diseases and contains the fewest false positives. Furthermore, DASSO-MB uses a heuristic search strategy based on calculating the association between variables, which avoids the time-consuming training process required by other machine-learning methods. We apply our algorithm to simulated datasets and a real case-control dataset. We compare DASSO-MB to other commonly used methods and show that our method significantly outperforms them and is capable of finding SNPs strongly associated with diseases. Conclusions: Our study shows that DASSO-MB can identify a minimal set of causal SNPs associated with diseases, which contains fewer false positives than other existing methods. Given the huge size of the genomic datasets produced by GWAS, this is critical for saving the potential costs of biological experiments and for providing an efficient guideline for pathogenesis research.
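
The shielding property mentioned in this abstract can be written as a conditional independence statement. The block below gives the standard textbook definition of a Markov blanket, not a formula taken from the paper; the symbols V (the full variable set, here the SNPs together with the disease status T), MB(T), and X are notation assumed for illustration.

```latex
% Markov blanket of a target T within a variable set V
% (standard definition; the notation V, MB(T), X is assumed here)
\[
  T \;\perp\!\!\!\perp\; V \setminus \bigl(\mathrm{MB}(T) \cup \{T\}\bigr)
  \;\Big|\; \mathrm{MB}(T)
\]
% Equivalently, for any subset X of the remaining variables:
\[
  P\bigl(T \mid \mathrm{MB}(T),\, X\bigr) \;=\; P\bigl(T \mid \mathrm{MB}(T)\bigr),
  \qquad X \subseteq V \setminus \bigl(\mathrm{MB}(T) \cup \{T\}\bigr)
\]
```

Read this way, once the variables in MB(T) are known, no other variable adds information about T. This is the sense in which a minimal blanket is free of redundant variables.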

Crossref

Directory of Open Access Journals

KU ScholarWorks

PubMed Central

Approaches to working in high-dimensional data spaces: gene expression microarrays

Author: A Dupuy
A Statnikov
AK Jain
B Efron
BJ Frey
C Lai
CF Aliferis
D J Miller
D Miller
DB Allison
DF Ransohoff
EP Xing
GV Trunk
I Guyon
J Novovicova
J Wang
JA Swets
JD Storey
KA Shedden
KY Yeung
L Ein-Dor
MW Graham
R Clarke
RO Duda
S Ramaswamy
T Lange
TR Golub
VN Vapnik
Y Wang
Z Wang
Publication venue: Nature Publishing Group

This review provides a focused summary of the implications of high-dimensional data spaces produced by gene expression microarrays for building better models of cancer diagnosis, prognosis, and therapeutics. We identify the unique challenges posed by high dimensionality to highlight methodological problems and discuss recent methods in predictive classification, unsupervised subclass discovery, and marker identification.

Crossref

PubMed Central

Mining expressed sequence tags identifies cancer markers of clinical interest

Author: A Aouacheria
A Cromer
AG Bader
B Vogelstein
BJ Quade
BR Zeeberg
C Cortes
CF Aliferis
CL Nutt
CM Perou
DR Rhodes
ET Munoz
Fabien Campagne
G Dennis Jr.
GP Donovan
GS Sellick
HK Lee
IB Rosenwald
JC Darnell
KF Manly
L Dyrskjot
L Skrabanek
LJ van 't Veer
Lucy Skrabanek
M Unoki
MJ Clemens
MS Boguski
R Aebersold
R Edgar
R Simon
RB Darnell
S Mukherjee
S Ramaswamy
SL Pomeroy
T Joachims
TJ MacDonald
TM Chu
VE Velculescu
W Liu
YT Chen
Z Zhang
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Gene expression data are a rich source of information about the transcriptional dysregulation of genes in cancer. Genes that display differential regulation in cancer are a subtype of cancer biomarkers. RESULTS: We present an approach to mine expressed sequence tags to discover cancer biomarkers. A false discovery rate analysis suggests that the approach generates less than 22% false discoveries when applied to combined human and mouse whole genome screens. With this approach, we identify the 200 genes most consistently differentially expressed in cancer (called HM200) and proceed to characterize these genes. When used for prediction in a variety of cancer classification tasks (in 24 independent cancer microarray datasets, 59 classifications total), we show that HM200 and the shorter gene list HM100 are very competitive cancer biomarker sets. Indeed, when compared to 13 published cancer marker gene lists, HM200 achieves the best or second-best classification performance in 79% of the classifications considered. CONCLUSION: These results indicate the existence of at least one general cancer marker set whose predictive value spans several tumor types and classification types. Our comparison with other marker gene lists shows that HM200 markers are mostly novel cancer markers. We also identify the previously published Pomeroy-400 list as another general cancer marker set. Strikingly, Pomeroy-400 has 27 genes in common with HM200. Our data suggest that a core set of genes is responsive to the deregulation of pathways involved in tumorigenesis in a variety of tumor types and that these genes could serve as transcriptional cancer markers in applications of clinical interest. Finally, our study suggests new strategies to select and evaluate cancer biomarkers in microarray studies.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Evolutionary approaches for the reverse-engineering of gene regulatory networks: A study on a biologically realistic dataset

Author: AJ Hartemink
B Schölkopf
C Cotta
CF Aliferis
CG Peter Spirtes
Chickering
Cédric Auliac
D Chickering
D Heckerman
D Husmeier
D Pe'er
DE Goldberg
DG DM Chickering
E Segal
EP van Someren
Florence d'Alché-Buc
Friedman
FV Jensen
GF Cooper
H de Jong
I Tsamardinos
Imoto M Goto
J Cheng
J Pearl
JH Holland
JM Pena
JW Myers
KAD Jong
M Quach
ML Wong
N Friedman
P Giudici
P Larranaga
P Spirtes
PP Le
R Etxeberria
R Robinson
RG Cowell
SW Mahfoud
T Gärtner
T Kocka
T Verma
Vincent Frouin
W Buntine
WH Hsu
Xavier Gidrol
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract. Background: Inferring gene regulatory networks from data requires the development of algorithms devoted to structure extraction. When only static data are available, gene interactions may be modelled by a Bayesian Network (BN) that represents the presence of direct interactions from regulators to regulees by conditional probability distributions. We used enhanced evolutionary algorithms to stochastically evolve a set of candidate BN structures and found the model that best fits data without prior knowledge. Results: We proposed various evolutionary strategies suitable for the task and tested our choices using simulated data drawn from a given bio-realistic network of 35 nodes, the so-called insulin network, which has been used in the literature for benchmarking. We assessed the inferred models against this reference to obtain statistical performance results. We then compared performances of evolutionary algorithms using two kinds of recombination operators that operate at different scales in the graphs. We introduced a niching strategy that reinforces diversity through the population and avoided trapping of the algorithm in one local minimum in the early steps of learning. We show the limited effect of the mutation operator when niching is applied. Finally, we compared our best evolutionary approach with various well-known learning algorithms (MCMC, K2, greedy search, TPDA, MMHC) devoted to BN structure learning. Conclusion: We studied the behaviour of an evolutionary approach enhanced by niching for the learning of gene regulatory networks with BN. We show that this approach outperforms classical structure learning methods in elucidating the original model. These results were obtained for the learning of a bio-realistic network and, more importantly, on various small datasets. This is a suitable approach for learning transcriptional regulatory networks from real datasets without prior knowledge.

HAL Evry

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

HAL-CEA

Hal-Diderot

Multiclass classification of microarray data samples with a reduced number of genes

Author: A Alizadeh
A Berger
A Dupuy
A Statnikov
AI Su
C Ambroise
C Furlanello
CE Shannon
CF Aliferis
DJC Mackay
DK Slonim
E Tapia
EL Allwein
Elizabeth Tapia
F Azuaje
F Masulli
FR Kschischang
G James
G Salton
I Guyon
I Shmulevich
I Tsamardinos
I Witten
J Fan
J Hadar
J Khan
J Zhu
JE Staunton
K Yeung
KH Liu
L Breiman
Laura Angelone
Leonardo Ornella
M Dettling
M Hollander
MA Delgado
N Cristianini
Pilar Bulacio
R Rifkin
RM Fano
S Dudoit
S Huang
S Lee
S Pomeroy
T Abeel
T Furey
T Li
TG Dietterich
TM Cover
V Guruswami
V Vapnik
X Qiu
Y Lin
Y Saeys
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: Multiclass classification of microarray data samples with a reduced number of genes is a rich and challenging problem in Bioinformatics research. The problem gets harder as the number of classes is increased. In addition, the performance of most classifiers is tightly linked to the effectiveness of mandatory gene selection methods. Critical to gene selection is the availability of estimates about the maximum number of genes that can be handled by any classification algorithm. Lack of such estimates may lead to either computationally demanding explorations of a search space with thousands of dimensions or classification models based on gene sets of unrestricted size. In the former case, unbiased but possibly overfitted classification models may arise. In the latter case, biased classification models unable to support statistically significant findings may be obtained. Results: A novel bound on the maximum number of genes that can be handled by binary classifiers in binary mediated multiclass classification algorithms of microarray data samples is presented. The bound suggests that high-dimensional binary output domains might favor the existence of accurate and sparse binary mediated multiclass classifiers for microarray data samples. Conclusions: A comprehensive experimental work shows that the bound is indeed useful to induce accurate and sparse multiclass classifiers for microarray data samples.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

CONICET Digital

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Gene selection for classification of microarray data based on the Bayes error

Author: A Ben-Dor
A Statnikov
AA Alizadeh
AL Blum
AR Webb
C Ambroise
C Ding
C Gentile
C Lai
C Lee
CF Aliferis
CH Ooi
D Singh
E Xing
EK Tang
F Goudail
G Carneiro
G Kohavi
GR Xuan
HC Peng
Hong-Wen Deng
I Tsamardinos
J Hua
J Khan
J Weston
Ji-Gang Zhang
JW Lee
K Fukunaga
K Tumer
K Yang
KY Yeung
L Devroye
L Yu
M Chow
M Dash
M Dettling
M Wang
M Xiong
MA Shipp
P Baldi
PA Devijver
R Blanco
R Diaz-Uriarte
R Schalkhoff
RO Duda
S Dudoit
S Mukherjee
S Singh
S Varma
T Golub
T Jirapech-Umpai
T Li
TH Bo
U Alon
X Liu
Y Lee
Y Li
ZY Wang
Publication venue: BioMed Central
Publication date: 01/01/2007

Abstract. Background: With DNA microarray data, selecting a compact subset of discriminative genes from thousands of genes is a critical step for accurate classification of phenotypes for, e.g., disease diagnosis. Several widely used gene selection methods select top-ranked genes according to their individual discriminative power in classifying samples into distinct categories, without considering correlations among genes. A limitation of these gene selection methods is that they may result in gene sets with some redundancy and yield an unnecessarily large number of candidate genes for classification analyses. Some recent studies show that incorporating gene-to-gene correlations into gene selection can remove redundant genes and improve classification accuracy. Results: In this study, we propose a new method, Based Bayes error Filter (BBF), to select relevant genes and remove redundant genes in classification analyses of microarray data. The effectiveness and accuracy of this method are demonstrated through analyses of five publicly available microarray datasets. The results show that our gene selection method is capable of achieving better accuracies than previous studies, while being able to effectively select relevant genes, remove redundant genes and obtain efficient and small gene sets for sample classification purposes. Conclusion: The proposed method can effectively identify a compact set of genes with high classification accuracy. This study also indicates that application of the Bayes error is a feasible and effective way of removing redundant genes in gene selection.

University of Missouri: MOspace

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central